Invariant image object recognition using Gaussian mixture densities
Abstract
This work presents a statistical image object recognition system based on Gaussian mixture densities in the context of the Bayesian decision rule. Optionally, a linear discriminant analysis is applied to reduce the number of free model parameters. This baseline system is then extended to incorporate invariances. As a first step, the available reference images are suitably multiplied. The same idea is then applied to the observations to be classified, yielding the novel ‘Virtual Test Data’ method, which has some advantages over classical classifier combination approaches. Global invariances are incorporated by using the so-called tangent distance. Here, tangent distance is embedded into a statistical framework, which leads, among other things, to a modified, more reliable estimation of the mixture density parameters. Tangent distance is also extended to compensate for local as well as global image transformations (distorted tangent distance). A large part of the experiments was performed on the well-known US Postal Service standard corpus for handwritten digit recognition. The proposed classifier was also successfully applied to the recognition of medical radiographs and red blood cells, as well as to the Columbia University Object Image Library (COIL-20) and the Max Planck Institute’s Chair Image Database. The error rate of 2.2% obtained on the US Postal Service corpus is the best error rate published so far on this data set.
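As a rough illustration of the tangent distance mentioned in the abstract, the sketch below computes a one-sided tangent distance between an observation and a reference image. It is not the implementation from this work: the function names (`shift_tangents`, `one_sided_tangent_distance`, `gaussian_blob`) are hypothetical, only horizontal and vertical shifts are modelled, the tangent vectors are approximated with finite differences, and the test images are synthetic.

```python
import numpy as np


def shift_tangents(image):
    """Approximate tangent vectors for small horizontal and vertical shifts.

    Assumption: only translations are modelled, using finite differences.
    """
    grad_rows, grad_cols = np.gradient(image.astype(float))
    # One column per transformation, each flattened to a vector of pixels.
    return np.stack([grad_cols.ravel(), grad_rows.ravel()], axis=1)


def one_sided_tangent_distance(observation, reference, tangents):
    """Squared distance from the observation to the tangent subspace of the reference."""
    diff = (observation - reference).astype(float).ravel()
    # Least-squares coefficients of the difference within the tangent subspace.
    alpha, *_ = np.linalg.lstsq(tangents, diff, rcond=None)
    residual = diff - tangents @ alpha
    return float(residual @ residual)


def gaussian_blob(size, center_row, center_col, sigma=3.0):
    """Synthetic test image (an assumption made for this sketch only)."""
    rows, cols = np.mgrid[0:size, 0:size]
    squared_radius = (rows - center_row) ** 2 + (cols - center_col) ** 2
    return np.exp(-squared_radius / (2 * sigma ** 2))


reference = gaussian_blob(32, 16.0, 16.0)
shifted = gaussian_blob(32, 16.0, 16.5)  # same blob, moved half a pixel

euclidean = float(np.sum((shifted - reference) ** 2))
tangent = one_sided_tangent_distance(shifted, reference, shift_tangents(reference))
print(f"squared Euclidean distance: {euclidean:.6f}")
print(f"one-sided tangent distance: {tangent:.6f}")
```

In this toy setting the tangent distance should come out well below the squared Euclidean distance, because a half-pixel shift lies almost entirely in the span of the two translation tangents. Within a Gaussian classifier, such a distance could stand in for the Euclidean term in the exponent, though how exactly that is done here is not specified in the abstract.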
Related resources
Invariant Image Object Recognition Using Mixture Densities
In this paper we present a mixture density based approach to invariant image object recognition. We start our experiments using Gaussian mixture densities within a Bayesian classifier. Invariance to affine transformations is achieved by replacing the Euclidean distance with Simard’s tangent distance. We propose an approach to estimating covariance matrices with respect to image invariances as w...
Improving automatic speech recognition using tangent distance
In this paper we present a new approach to variance modelling in automatic speech recognition (ASR) that is based on tangent distance (TD). Using TD, classifiers can be made invariant w.r.t. small transformations of the data. Such transformations generate a manifold in a high dimensional feature space when applied to an observation vector. While conventional classifiers determine the distance b...
Maximum Entropy and Gaussian Models for Image Object Recognition
The principle of maximum entropy is a powerful framework that can be used to estimate class posterior probabilities for pattern recognition tasks. In this paper, we show how this principle is related to the discriminative training of Gaussian mixture densities using the maximum mutual information criterion. This leads to a relaxation of the constraints on the covariance matrices to be positive ...
Gaussian Mixture Density Modeling, Decomposition, and Applications - Image Processing, IEEE Transactions on
Gaussian mixture density modeling and decomposition is a classic yet challenging research topic. We present a new approach to the modeling and decomposition of Gaussian mixtures by using robust statistical methods. The mixture distribution is viewed as a (severely) contaminated Gaussian density. Using this model and the model-fitting (MF) estimator, we propose a recursive algorithm call...
A Mixture Density Based Approach to Object Recognition for Image Retrieval
In the last few years, statistical classifiers based on Gaussian mixture densities have proved to be very efficient for automatic speech recognition. The aim of this paper is to find out how well such a ‘conventional’ statistical classifier performs in the field of image object recognition (for future use within a content-based image retrieval system). We present a mixture density based Bayesian clas...
Evolutionary Feature Selection for Probabilistic Object Recognition, Novel Object Detection and Object Saliency Estimation using GMMs
This paper presents a method for object recognition, novel object detection, and estimation of the most salient object within a set. Objects are sampled using a scale invariant region detector, and each region is characterized by the subset of texture and color descriptors selected by a Genetic Algorithm (GA). Using multiple views of an object, and multiple regions per view, objects are modeled...
Publication date: 2001